Transfer learning of language-independent end-to-end ASR with language model fusion
This work explores better adaptation methods to low-resource languages using
an external language model (LM) under the framework of transfer learning. We
first build a language-independent ASR system in a unified sequence-to-sequence
(S2S) architecture with a shared vocabulary among all languages. During
adaptation, we perform LM fusion transfer, where an external LM is integrated
into the decoder network of the attention-based S2S model in the whole
adaptation stage, to effectively incorporate linguistic context of the target
language. We also investigate various seed models for transfer learning.
Experimental evaluations using the IARPA BABEL data set show that LM fusion
transfer improves performance on all five target languages compared with
simple transfer learning when external text data is available. Our final
system drastically reduces the performance gap from the hybrid systems.
Comment: Accepted at ICASSP201
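The fusion step described above can be sketched as shallow fusion, where the decoder's next-token log-probabilities are interpolated with those of the external LM. This is a minimal illustration, not the paper's exact integration: the `lm_weight` value and the toy vocabulary distributions here are assumptions.

```python
import numpy as np

def shallow_fusion(s2s_logprobs, lm_logprobs, lm_weight=0.3):
    """Combine S2S decoder and external LM next-token scores.

    lm_weight is an illustrative hyperparameter, not the paper's value.
    """
    return s2s_logprobs + lm_weight * lm_logprobs

# Toy next-token distributions over a 4-token vocabulary.
s2s = np.log(np.array([0.5, 0.2, 0.2, 0.1]))  # decoder's prediction
lm = np.log(np.array([0.1, 0.6, 0.2, 0.1]))   # external LM's prediction
fused = shallow_fusion(s2s, lm)
best = int(np.argmax(fused))                   # token chosen after fusion
```

In practice this fused score replaces the decoder's score inside beam search, so the LM's linguistic context of the target language influences every decoding step.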
COMPARISON OF FOOT MORPHOLOGY AND PREFERRED SHOE FOR IMPROVING RUNNING SHOE FITTING
The purpose of this study was to compare shoe internal space with foot shapes of different types in order to improve the sense of shoe fit. 347 healthy subjects (male=160; female=187) without any pathological conditions of the foot participated in this study. 11 pairs of running shoes in different sizes (230-280 mm) but with the same material and appearance were prepared, and the shapes of the shoe lasts were also measured. To evaluate the sense of shoe fit, 6 fit indicators were analysed by comparing the shape of the shoe last with foot morphology. We found that people with wider feet tended to wear tighter shoes, while people with narrower feet preferred looser shoes, which seems to be significantly affected by wearing experience. The sense of shoe fit also differed significantly by gender and foot type, which can serve as important data for recommending shoe sizes and making customized shoes.
Improved Multi-Shot Diffusion-Weighted MRI with Zero-Shot Self-Supervised Learning Reconstruction
Diffusion MRI is commonly performed using echo-planar imaging (EPI) due to
its rapid acquisition time. However, the resolution of diffusion-weighted
images is often limited by magnetic field inhomogeneity-related artifacts and
blurring induced by T2- and T2*-relaxation effects. To address these
limitations, multi-shot EPI (msEPI) combined with parallel imaging techniques
is frequently employed. Nevertheless, reconstructing msEPI can be challenging
due to phase variation between multiple shots. In this study, we introduce a
novel msEPI reconstruction approach called zero-MIRID (zero-shot
self-supervised learning of Multi-shot Image Reconstruction for Improved
Diffusion MRI). This method jointly reconstructs msEPI data by incorporating
deep learning-based image regularization techniques. The network incorporates
CNN denoisers in both k- and image-spaces, while leveraging virtual coils to
enhance image reconstruction conditioning. By employing a self-supervised
learning technique and dividing sampled data into three groups, the proposed
approach achieves superior results compared to the state-of-the-art parallel
imaging method, as demonstrated in an in-vivo experiment.
Comment: 10 pages, 4 figures
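The three-group division of the sampled data mentioned above can be sketched as a partition of the acquired k-space locations into disjoint masks (for example: network input, self-supervised loss, and validation). The group ratios below are illustrative assumptions; the abstract states only that three groups are used.

```python
import numpy as np

def split_kspace_mask(sampling_mask, ratios=(0.6, 0.2, 0.2), seed=0):
    """Partition sampled k-space locations into three disjoint groups.

    A sketch of the self-supervised splitting strategy: the acquired
    locations are randomly divided so the network can be trained and
    monitored without any fully sampled reference. Ratios are assumed.
    """
    rng = np.random.default_rng(seed)
    idx = np.flatnonzero(sampling_mask)      # acquired k-space locations
    rng.shuffle(idx)
    n = len(idx)
    cuts = [int(n * ratios[0]), int(n * (ratios[0] + ratios[1]))]
    masks = []
    for part in np.split(idx, cuts):
        m = np.zeros_like(sampling_mask)
        m.flat[part] = 1
        masks.append(m)
    return masks  # [input_mask, loss_mask, validation_mask]
```

Because the groups are disjoint, the loss is always evaluated on samples the network never saw as input, which is what makes the scheme self-supervised.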
Scan Specific Artifact Reduction in K-space (SPARK) Neural Networks Synergize with Physics-based Reconstruction to Accelerate MRI
Purpose: To develop a scan-specific model that estimates and corrects k-space
errors made when reconstructing accelerated Magnetic Resonance Imaging (MRI)
data.
Methods: Scan-Specific Artifact Reduction in k-space (SPARK) trains a
convolutional-neural-network to estimate and correct k-space errors made by an
input reconstruction technique by back-propagating from the mean-squared-error
loss between an auto-calibration signal (ACS) and the input technique's
reconstructed ACS. First, SPARK is applied to GRAPPA and demonstrates improved
robustness over other scan-specific models, such as RAKI and residual-RAKI.
Subsequent experiments demonstrate that SPARK synergizes with residual-RAKI to
improve reconstruction performance. SPARK also improves reconstruction quality
when applied to advanced acquisition and reconstruction techniques like 2D
virtual coil (VC-) GRAPPA, 2D LORAKS, 3D GRAPPA without an integrated ACS
region, and 2D/3D wave-encoded images.
Results: SPARK yields 1.5x - 2x RMSE reduction when applied to GRAPPA and
improves robustness to ACS size for various acceleration rates in comparison to
other scan-specific techniques. When applied to advanced reconstruction
techniques such as residual-RAKI, 2D VC-GRAPPA and LORAKS, SPARK achieves up to
20% RMSE improvement. SPARK with 3D GRAPPA also improves performance by ~2x and
perceived image quality without a fully sampled ACS region. Finally, SPARK
synergizes with non-cartesian 2D and 3D wave-encoding imaging by reducing RMSE
between 20-25% and providing qualitative improvements.
Conclusion: SPARK synergizes with physics-based acquisition and
reconstruction techniques to improve accelerated MRI by training scan-specific
models to estimate and correct reconstruction errors in k-space.
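The training signal described in the Methods can be sketched as follows: the network's target is the residual between the acquired ACS and the input technique's reconstructed ACS, and the final correction simply adds the estimated error back. This is a toy numpy sketch of that signal; the actual method uses a convolutional neural network as the error estimator.

```python
import numpy as np

def acs_residual_target(acs_true, acs_recon):
    """The k-space error made by the input reconstruction on the ACS.

    SPARK's network is trained to predict this residual (the abstract's
    back-propagated MSE loss is computed against it).
    """
    return acs_true - acs_recon

def mse_loss(pred_error, acs_true, acs_recon):
    """Complex-valued mean-squared error between prediction and residual."""
    diff = pred_error - acs_residual_target(acs_true, acs_recon)
    return np.mean(np.abs(diff) ** 2)

def apply_correction(recon_kspace, pred_error):
    """Corrected k-space = input reconstruction + estimated error."""
    return recon_kspace + pred_error
```

A perfect error estimate drives the loss to zero and recovers the true k-space exactly, which is why the approach composes with any input reconstruction (GRAPPA, RAKI, LORAKS, wave-encoding).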
CWCL: Cross-Modal Transfer with Continuously Weighted Contrastive Loss
This paper considers contrastive training for cross-modal 0-shot transfer
wherein a pre-trained model in one modality is used for representation learning
in another domain using pairwise data. The learnt models in the latter domain
can then be used for a diverse set of tasks in a zero-shot way, similar to
``Contrastive Language-Image Pre-training (CLIP)'' and ``Locked-image Tuning
(LiT)'' that have recently gained considerable attention. Most existing works
for cross-modal representation alignment (including CLIP and LiT) use the
standard contrastive training objective, which employs sets of positive and
negative examples to align similar and repel dissimilar training data samples.
However, similarity amongst training examples has a more continuous nature,
thus calling for a more `non-binary' treatment. To address this, we propose a
novel loss function called Continuously Weighted Contrastive Loss (CWCL) that
employs a continuous measure of similarity. With CWCL, we seek to align the
embedding space of one modality with another. Owing to the continuous nature of
similarity in the proposed loss function, these models outperform existing
methods for 0-shot transfer across multiple models, datasets and modalities.
Particularly, we consider the modality pairs of image-text and speech-text and
our models achieve 5-8% (absolute) improvement over previous state-of-the-art
methods in 0-shot image classification and 20-30% (absolute) improvement in
0-shot speech-to-intent classification and keyword classification.
Comment: Accepted to Neural Information Processing Systems (NeurIPS) 2023 conference
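The core idea above, replacing binary positive/negative labels with a continuous pairwise weight, can be sketched as below. This is a simplified interpretation of the abstract, not the paper's exact formulation: the choice of deriving weights from the frozen modality's self-similarity, and the temperature value, are assumptions.

```python
import numpy as np

def cwcl_sketch(z_frozen, z_new, tau=0.07):
    """Continuously Weighted Contrastive Loss (sketch).

    Instead of treating only diagonal pairs (i, i) as positives, every
    cross-modal pair (i, j) contributes with a continuous weight w[i, j]
    computed here from similarity within the frozen modality's embedding
    space. Weighting scheme and tau are illustrative assumptions.
    """
    def l2_normalize(z):
        return z / np.linalg.norm(z, axis=1, keepdims=True)

    zf, zn = l2_normalize(z_frozen), l2_normalize(z_new)
    # Continuous weights in [0, 1] from frozen-modality self-similarity,
    # normalized per anchor so each row sums to 1.
    w = (zf @ zf.T + 1.0) / 2.0
    w = w / w.sum(axis=1, keepdims=True)
    # Cross-modal logits and per-anchor log-softmax, as in InfoNCE.
    logits = zf @ zn.T / tau
    log_p = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.sum(w * log_p, axis=1))
```

With a binary (identity) weight matrix this reduces to the standard contrastive objective used by CLIP and LiT; the continuous weights are what let similar-but-not-paired samples attract rather than repel.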